10/29/2019

Agenda

  • Polynomial regression
  • Step functions
  • Regression splines
  • Smoothing splines
  • Generalized additive models

Recap

Linear B-splines

Cubic B-splines

Example of fit

\[\hat{f}(x) = \hat{\beta}_0 + \hat{\beta}_1 b_1(x)+ \hat{\beta}_2 b_2(x)+ \dots + \hat{\beta}_{K+3} b_{K+3}(x)\]


Regression splines continued…

Natural splines

  • Splines can have high variance at the outer range of the predictors; this is especially visible in the width of the confidence bands near the boundaries
  • A natural spline is a regression spline with additional boundary constraints: the function is required to be linear at the boundary (in the region where \(X\) is smaller than the smallest knot, or larger than the largest knot)
  • For cubic B-splines, this adds \(4 = 2 \times 2\) extra constraints, and allows us to put more internal knots for the same degrees of freedom as a regular cubic spline
  • In R, you just need to change bs to ns!
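
A minimal sketch of the bs-to-ns switch, assuming a data frame dat with a numeric predictor x and response y (placeholder names, not from the slides):

library(splines)
# Cubic B-spline basis: with df = 6 (and degree 3), bs() places 3 internal knots
fit_bs <- lm(y ~ bs(x, df = 6), data = dat)
# Natural cubic spline with the same df: the 4 boundary constraints leave
# room for more internal knots (here 5 instead of 3)
fit_ns <- lm(y ~ ns(x, df = 6), data = dat)

Both models use 6 basis functions plus an intercept; the natural spline simply spends the same budget on a fit that is linear beyond the boundary knots.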

Choosing number and locations of the knots

  • One strategy is to decide \(K\), the number of internal knots, and then place them at appropriate quantiles of the observed \(X\)

  • A default choice is to add knots at the boundaries (total knots = \(K+2\))

  • Given \(K\) internal knots and polynomial degree \(d\), there are \(K+1\) subintervals and \(d + K + 1\) degrees of freedom (\(\beta_0, \beta_1, \dots, \beta_{d+K}\))

  • A cubic spline with \(K\) internal knots has \(K + 4\) parameters or degrees of freedom

  • A natural spline with \(K\) internal knots has \(K\) degrees of freedom: the \(2 \times 2\) boundary constraints remove \(4\) parameters from the \(K + 4\) of a regular cubic spline

Fix a degree \(d\), and use cross-validation to choose the number of knots!
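
A sketch of that strategy with natural cubic splines, assuming a data frame dat with predictor x and response y and the boot package for k-fold cross-validation (all names here are placeholders):

library(splines)
library(boot)
set.seed(1)
# Estimate 10-fold CV error for natural splines over a grid of degrees of freedom
df_grid <- 2:10
cv_error <- sapply(df_grid, function(df) {
  fit <- glm(y ~ ns(x, df = df), data = dat)
  cv.glm(dat, fit, K = 10)$delta[1]
})
df_grid[which.min(cv_error)]  # df (and hence number of internal knots) with lowest CV error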

Splines for classification

Splines can also be used when the response variable is qualitative. For example, consider the logistic regression model

\[\log \left( \frac{p}{1-p} \right) = f(x) = \sum_{k=0}^{K + d} \beta_k b_k(x)\]

Once the basis functions have been defined, we just need to estimate coefficients \(\beta_k\) using a standard logistic regression procedure.

A smooth estimate of the conditional probability \(P(Y = 1 \mid x)\) can then be used for classification.
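
A minimal sketch, assuming a binary response y coded 0/1 and a single predictor x in a data frame dat (placeholder names):

library(splines)
# Logistic regression on a cubic B-spline basis for x
fit <- glm(y ~ bs(x, df = 6), family = binomial, data = dat)
# Smooth estimate of P(Y = 1 | x) on a grid, usable as a classifier via thresholding
x_grid <- seq(min(dat$x), max(dat$x), length.out = 200)
p_hat <- predict(fit, newdata = data.frame(x = x_grid), type = "response")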

Linear logistic regression

Flexible logistic regression

Smoothing splines

Consider this criterion for fitting a smooth function \(g(x)\) to some data:

\[\text{argmin}_{g \in \mathbb{S}} \left\{ \sum_{i=1}^n (y_i - g(x_i))^2 + \lambda \int g^{\prime \prime} (t)^2 dt \right\}\]

  • The first term is the RSS, and tries to make \(g(x)\) match the data at each \(x_i\)
  • The second term is a roughness penalty and controls how wiggly \(g(x)\) is, via the tuning parameter \(\lambda \geq 0\):
    • The smaller \(\lambda\), the more wiggly the function, eventually interpolating \(y_i\) when \(\lambda = 0\)
    • As \(\lambda \rightarrow +\infty\), the function \(g(x)\) becomes linear

Why second derivative?

  • Derivative of a function: slope of tangent line at each point
  • Second derivative of a function: change in slope of tangent line at each point
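
One consequence worth spelling out: a linear function has zero second derivative everywhere, so it incurs no roughness penalty at all. That is exactly why the fit collapses to a straight line as \(\lambda \rightarrow +\infty\).

\[g(x) = a + bx \quad \Rightarrow \quad g^{\prime \prime}(x) = 0 \quad \Rightarrow \quad \int g^{\prime \prime}(t)^2 \, dt = 0\]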

Choosing \(\lambda\)

  • The solution is a natural cubic spline, with a knot at every unique value of \(x_i\). The penalty still controls the roughness via \(\lambda\)
  • As \(\lambda\) increases from \(0\) to \(+\infty\), the effective degrees of freedom \(\text{df}(\lambda)\) decrease from \(n\) to \(2\)
  • \(\lambda\) should be chosen via cross-validation
  • In R: smooth.spline(X, Y, df = 10)
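
Rather than fixing df by hand, you can let smooth.spline pick \(\lambda\) itself; a minimal sketch (X and Y as in the call above):

fit <- smooth.spline(X, Y, cv = TRUE)  # cv = TRUE: leave-one-out CV; cv = FALSE (default): GCV
fit$df      # effective degrees of freedom implied by the chosen lambda
fit$lambda  # the selected smoothing parameter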

Generalized Additive Models

GAMs allow for flexible nonlinearities in several variables, while retaining the additive structure of linear methods: we calculate a separate \(f_j\) for each \(X_j\), and then add together all of their contributions.

\[y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_p(x_{ip}) + \varepsilon_i\]

  • The non-linear fits can potentially make more accurate predictions for the response \(Y\)
  • Because the model is additive, we can still examine the effect of each \(X_j\) on \(Y\) individually while holding all of the other variables fixed
  • The main limitation of GAMs is that the model is restricted to be additive: with many variables, important interactions can be missed

GAM for regression

  • You can use smoothing splines, B-splines or natural splines. You can also mix terms - some linear, some nonlinear, e.g. 

gam(mpg ~ ns(horsepower, df = 5) + ns(acceleration, df = 5) + year)

  • Coefficients are not that interesting; the fitted function values are (see the plotting sketch below)
  • GAMs are additive, although low-order interactions can be included in a natural way, using terms of the form

gam(mpg ~ ns(horsepower, df = 5) : ns(acceleration, df = 5))
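
Putting the pieces together, a sketch that fits the additive model above and plots each fitted \(f_j\) with standard-error bands. This assumes the Auto data from the ISLR package and the gam package, which the slide code appears to use:

library(gam)
library(ISLR)   # assumption: the Auto data set; the slides' actual data object may differ
fit <- gam(mpg ~ ns(horsepower, df = 5) + ns(acceleration, df = 5) + year, data = Auto)
# One panel per term: the fitted f_j for each predictor, with standard-error bands
plot(fit, se = TRUE)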

Holding acceleration and manufacturing year fixed, fuel efficiency tends to decrease with horsepower

GAM for classification

\[\log\left( \frac{p}{1-p} \right) = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_p(x_{ip})\]
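
A minimal sketch of a logistic GAM, again assuming the Auto data (the mpg > 30 cutoff is arbitrary, just to create a binary response):

library(gam)
library(ISLR)   # assumption: Auto data again; the mpg > 30 threshold is illustrative only
# Each s(., df) term is a smoothing spline; family = binomial gives the logit link
fit <- gam(I(mpg > 30) ~ s(horsepower, df = 5) + s(acceleration, df = 5) + year,
           family = binomial, data = Auto)
plot(fit, se = TRUE)  # fitted f_j's on the logit scale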

Question time